The data for each company was read directly from Yahoo Finance using pandas-datareader.
pip install pandas_datareader
# Reading the datasets using the given libraries
import pandas as pd
import pandas_datareader as pdr
import datetime
# Setting the start date as 1 January 2000 and end date as 8 December 2020
startdate = datetime.datetime(2000, 1, 1)
enddate = datetime.datetime(2020, 12, 8)
For ease of analysis, a long dataframe named ac_price was created that stores just the adjusted closing price of each stock ticker. This dataframe consists of approximately 31,000 data points.
# Creating a list of stock tickers
stocks = ['FB', 'AMZN', 'AAPL', 'MSFT', 'GOOG', 'TSLA']
# Storing the adjusted closing price data of all the stock tickers in the given dataframe
ac_price = pdr.get_data_yahoo(stocks, startdate, enddate)['Adj Close']
ac_price.head()
# Reading the data from a csv file if the above doesn't work
# ac_price = pd.read_csv("ac_price.csv").set_index("Date")
# ac_price.head()
For analysis, a dataframe named daily_return was created that stores the daily returns derived from each stock ticker's adjusted closing prices.
# Calculating daily returns for each stock ticker and storing in a new dataframe
daily_return = ac_price.pct_change()
daily_return.head()
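As a quick sanity check of what pct_change computes, a toy three-price series (hypothetical values, not from the dataset) shows that each daily return is (today's price - yesterday's price) / yesterday's price:

```python
import pandas as pd

# Toy check of the daily-return formula: each value is
# (today's price - yesterday's price) / yesterday's price
prices = pd.Series([100.0, 110.0, 99.0])
returns = prices.pct_change()

# First value is NaN (no prior day); the rest are approximately [0.10, -0.10]
print(returns.tolist())
```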
# Importing the required libraries
import plotly
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.subplots import make_subplots
import plotly.tools as tls
import plotly.express as px
import plotly.figure_factory as ff
from plotly.offline import iplot
from plotly.offline import init_notebook_mode
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
%matplotlib inline
import cufflinks as cf
cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)
py.sign_in('username', 'api_key')  # Replace with your own Chart Studio credentials
# Creating subplots to compare the adjusted closing prices of each stock ticker
# Each stock ticker has been assigned a unique line color and width
fig = make_subplots(rows=3, cols=2)
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['FB'], name="Facebook", line=dict(color='darkblue', width=1.5)), row=1, col=1)
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['AAPL'], name="Apple", line=dict(color='grey', width=1.5)), row=1, col=2)
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['AMZN'], name="Amazon", line=dict(color='green', width=1.5)), row=2, col=1)
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['MSFT'], name="Microsoft", line=dict(color='goldenrod', width=1.5)), row=2, col=2)
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['GOOG'], name="Google", line=dict(color='orange', width=1.5)), row=3, col=1)
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['TSLA'], name="Tesla", line=dict(color='darkred', width=1.5)), row=3, col=2)
fig.update_layout(height = 800, width = 1000, title_text="Comparing Adjusted Closing Prices")
fig.show()
A moving average is a calculation that creates a series of means over different subsets of the full data set.
In finance, a moving average (MA) helps smooth out price data by creating a constantly updated average price. This filters out the "noise" caused by random short-term price fluctuations and makes trends easier to examine.
For the purpose of this project, a 100-day moving average was calculated on the adjusted closing price of each stock ticker. A 100-day moving average is the mean of the adjusted closing prices over the previous 100 days: its first data point is the average of prices from Day 1 to Day 100, the next is the average of prices from Day 2 to Day 101, and so forth.
# Creating Moving Averages column for each stock ticker in the ac_price dataframe
ac_price['FB 100days MA'] = ac_price['FB'].rolling(100).mean()
ac_price['AAPL 100days MA'] = ac_price['AAPL'].rolling(100).mean()
ac_price['AMZN 100days MA'] = ac_price['AMZN'].rolling(100).mean()
ac_price['MSFT 100days MA'] = ac_price['MSFT'].rolling(100).mean()
ac_price['GOOG 100days MA'] = ac_price['GOOG'].rolling(100).mean()
ac_price['TSLA 100days MA'] = ac_price['TSLA'].rolling(100).mean()
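The rolling-window semantics described above can be sanity-checked on a toy series, using a 3-day window in place of 100 days (made-up values, not stock prices):

```python
import pandas as pd

# Toy check of rolling(n).mean(): the first defined value is the mean of
# days 1-3, the next the mean of days 2-4, and so on; earlier rows are NaN
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])
ma3 = s.rolling(3).mean()

print(ma3.tolist())  # [nan, nan, 20.0, 30.0, 40.0]
```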
# Examining each stock's adjusted closing price along with its 100-day moving average
# Each stock ticker has been assigned a unique line color and width
# The moving average line has been assigned the color red in all the graphs
# Facebook
fig = go.Figure(data=go.Scatter(x=ac_price.index, y=ac_price['FB'], name = "Facebook Adjusted Closing Price", line=dict(color='darkblue', width=1.5)))
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['FB 100days MA'], name = "Facebook 100 Days Moving Average"))
fig.show()
# Apple
fig = go.Figure(data=go.Scatter(x=ac_price.index, y=ac_price['AAPL'], name = "Apple Adjusted Closing Price", line=dict(color='grey', width=1.5)))
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['AAPL 100days MA'], name = "Apple 100 Days Moving Average"))
fig.show()
# Amazon
fig = go.Figure(data=go.Scatter(x=ac_price.index, y=ac_price['AMZN'], name = "Amazon Adjusted Closing Price", line=dict(color='green', width=1.5)))
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['AMZN 100days MA'], name = "Amazon 100 Days Moving Average"))
fig.show()
# Microsoft
fig = go.Figure(data=go.Scatter(x=ac_price.index, y=ac_price['MSFT'], name = "Microsoft Adjusted Closing Price", line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['MSFT 100days MA'], name = "Microsoft 100 Days Moving Average"))
fig.show()
# Google
fig = go.Figure(data=go.Scatter(x=ac_price.index, y=ac_price['GOOG'], name = "Google Adjusted Closing Price", line=dict(color='orange', width=1.5)))
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['GOOG 100days MA'], name = "Google 100 Days Moving Average"))
fig.show()
# Tesla
fig = go.Figure(data=go.Scatter(x=ac_price.index, y=ac_price['TSLA'], name = "Tesla Adjusted Closing Price", line=dict(color='darkred', width=1.5)))
fig.add_trace(go.Scatter(x=ac_price.index, y=ac_price['TSLA 100days MA'], name = "Tesla 100 Days Moving Average"))
fig.show()
# Creating subplots to compare the daily return trends of each stock ticker
# Each stock ticker has been assigned a unique line color and width
fig = make_subplots(rows=3, cols=2)
fig.add_trace(go.Scatter(x=daily_return.index, y=daily_return['FB'], name="Facebook", line=dict(color='darkblue', width=1.5)), row=1, col=1)
fig.add_trace(go.Scatter(x=daily_return.index, y=daily_return['AAPL'], name="Apple", line=dict(color='grey', width=1.5)), row=1, col=2)
fig.add_trace(go.Scatter(x=daily_return.index, y=daily_return['AMZN'], name="Amazon", line=dict(color='green', width=1.5)), row=2, col=1)
fig.add_trace(go.Scatter(x=daily_return.index, y=daily_return['MSFT'], name="Microsoft", line=dict(color='goldenrod', width=1.5)), row=2, col=2)
fig.add_trace(go.Scatter(x=daily_return.index, y=daily_return['GOOG'], name="Google", line=dict(color='orange', width=1.5)), row=3, col=1)
fig.add_trace(go.Scatter(x=daily_return.index, y=daily_return['TSLA'], name="Tesla", line=dict(color='darkred', width=1.5)), row=3, col=2)
fig.update_layout(height = 800, width = 1000, title_text="Comparing Daily Returns")
fig.show()
In finance, correlation is a statistical measure of how two stocks move in relation to one another. The Pearson correlation coefficient (Pearson's r), which ranges between -1 and +1, has been used to examine the co-movement of the stocks' daily returns.
If the prices or returns move in a similar direction, the stocks are considered positively correlated. When the prices or returns consistently move in opposite directions, the stocks are negatively correlated.
In the context of daily stock returns, the stocks in a portfolio should have low (or no) correlation with one another. This helps limit an investor's losses: when one stock's daily returns are negative, the returns of a weakly correlated stock may well be positive.
# Calculating the Pearson correlation on daily returns of the stock tickers
daily_return.corr(method='pearson')
# Creating a heatmap to explore the correlations
fig = px.imshow(daily_return.corr(), height=700, width=800, title = 'Correlation Among Daily Returns', color_continuous_scale='Inferno')
fig.show()
# Creating subplots to compare Tesla's daily returns with each of the tech stocks
fig = make_subplots(rows=5, cols=1)
fig.add_trace(go.Scatter(
    x=daily_return['TSLA'],
    y=daily_return['FB'],
    mode='markers',
    marker=dict(
        color='yellow',
        size=10,
        line=dict(color='black', width=1))), row=1, col=1)
fig.add_trace(go.Scatter(
    x=daily_return['TSLA'],
    y=daily_return['AAPL'],
    mode='markers',
    marker=dict(
        color='grey',
        size=10,
        line=dict(color='black', width=1))), row=2, col=1)
fig.add_trace(go.Scatter(
    x=daily_return['TSLA'],
    y=daily_return['AMZN'],
    mode='markers',
    marker=dict(
        color='lightgreen',
        size=10,
        line=dict(color='black', width=1))), row=3, col=1)
fig.add_trace(go.Scatter(
    x=daily_return['TSLA'],
    y=daily_return['MSFT'],
    mode='markers',
    marker=dict(
        color='orange',
        size=10,
        line=dict(color='black', width=1))), row=4, col=1)
fig.add_trace(go.Scatter(
    x=daily_return['TSLA'],
    y=daily_return['GOOG'],
    mode='markers',
    marker=dict(
        color='pink',
        size=10,
        line=dict(color='black', width=1))), row=5, col=1)
fig.update_xaxes(title_text="Tesla Daily Returns", row=1, col=1)
fig.update_xaxes(title_text="Tesla Daily Returns", row=2, col=1)
fig.update_xaxes(title_text="Tesla Daily Returns", row=3, col=1)
fig.update_xaxes(title_text="Tesla Daily Returns", row=4, col=1)
fig.update_xaxes(title_text="Tesla Daily Returns", row=5, col=1)
fig.update_yaxes(title_text="Facebook Daily Returns", row=1, col=1)
fig.update_yaxes(title_text="Apple Daily Returns", row=2, col=1)
fig.update_yaxes(title_text="Amazon Daily Returns", row=3, col=1)
fig.update_yaxes(title_text="Microsoft Daily Returns", row=4, col=1)
fig.update_yaxes(title_text="Google Daily Returns", row=5, col=1)
fig.update_layout(height = 2500, width = 1000, title='Tesla vs Others: Daily Returns')
fig.update_layout(showlegend=False)
fig.show()
From the above visuals, it can be observed that Tesla has the lowest correlation with each of the other five stock tickers. This could be because Tesla operates in a different sector from the other five companies: Tesla is in the automobile sector, while Facebook, Apple, Amazon, Microsoft and Google are classified as technology companies.
It can also be observed that the correlations of daily returns among the five tech stock tickers are higher than each ticker's correlation with Tesla's returns.
Therefore, an investor could pair Tesla's stock with a tech-company stock in their portfolio to help reduce losses.
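The diversification argument above can be sketched numerically. The sketch below uses synthetic returns rather than the actual tickers: two hypothetical stocks constructed to be highly correlated versus two constructed to be nearly uncorrelated, compared as equal-weight portfolios:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily returns: b is built to move with a, c is independent of a
a = rng.normal(0, 0.02, 1000)
b = 0.9 * a + 0.1 * rng.normal(0, 0.02, 1000)
c = rng.normal(0, 0.02, 1000)

# Equal-weight two-stock portfolios
high_corr_port = 0.5 * (a + b)  # highly correlated pair
low_corr_port = 0.5 * (a + c)   # nearly uncorrelated pair

# The weakly correlated pair yields a less volatile portfolio
print(high_corr_port.std(), low_corr_port.std())
```

The same effect is what holding Tesla alongside a single tech stock would aim for: losses in one leg are less likely to coincide with losses in the other.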
In finance and investing, standard deviation is an indicator of market volatility and, therefore, of risk: the higher the standard deviation, the riskier the investment. At the same time, the riskier the security, the greater its potential return.
Though risk and return are positively correlated, there is no guarantee that taking on greater risk results in a greater return.
The following program compares the standard deviation and mean return of the given stock tickers.
# Creating a plot to observe the mean and standard deviation of each stock ticker
# Storing the stock ticker names in a list
text = list(daily_return.mean().index.values)
# Assigning lightblue markers with a black border for each stock ticker on the plot according to its mean and standard deviation
fig = go.Figure(go.Scatter(x = daily_return.mean(),
                           y = daily_return.std(),
                           mode = 'markers+text',
                           marker = dict(color = 'lightblue', size = 8, line = dict(color = 'black', width = 1))))
# Presenting the markers in a bordered box for clearer presentation
fig.update_layout(title = "Risk vs. Expected Return",
                  xaxis = dict(title = "Expected Return", range = (-0.001, 0.003)),
                  yaxis = dict(title = "Risk", range = (0, 0.045)),
                  annotations = [dict(showarrow=True, arrowhead=2, arrowsize = 2,
                                      x = x, y = y, xref='x', yref='y', text = i, ax=20,
                                      bordercolor='black', borderwidth=2, borderpad=5, bgcolor='lightblue')
                                 for x, y, i in zip(daily_return.mean(), daily_return.std(), text)])
fig.show()
From the above visual, it can be noted that among the given stock tickers, Tesla's stock has both the highest risk and the highest expected return.
A few of the following reasons can be attributed to the above -
Long short-term memory (LSTM) is a type of artificial recurrent neural network used in deep learning. LSTMs suit sequence prediction problems because they can remember patterns over long durations, which has made them useful for predicting stock prices.
The following program applies an LSTM to predict Tesla's adjusted closing price from the prices of the previous 30 days.
Prior to using the model in this project, sufficient time was spent studying and learning the capabilities of the model through the given source -
First, the required libraries (TensorFlow, Keras and scikit-learn) are installed and imported.
pip install tensorflow
pip install keras
pip install scikit-learn
# Importing other necessary libraries
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('bmh')
%matplotlib inline
import math
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
For the given program, Tesla's stock data for the period 2015 to 2020 has been used. pandas-datareader was used to retrieve the data directly from Yahoo Finance.
# Retrieving the data from Yahoo Finance
TSLA = pdr.get_data_yahoo('TSLA', start='2015-01-01', end='2020-12-08')
TSLA.head()
# TSLA = pd.read_csv("TSLA.csv").set_index("Date")
# TSLA.head()
As the program focuses only on adjusted closing price prediction, the data was filtered down to the Adj Close column, with Date retained as the index.
# Filtering the data to include just the Adjusted Closing Price
data = TSLA.filter(['Adj Close'])
# Storing the data values
TSLA = data.values
Then, a variable was created to store the length of the training dataset; the training size was set to 80% of the data.
# Creating a length variable to store the length of the training dataset; assigned length is 80%.
training_length = math.ceil(len(TSLA) *.80)
Deep learning models perform best when the input data is scaled. Therefore, the data values were scaled to the range 0 to 1 inclusive.
# Scaling the array data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_TSLA = scaler.fit_transform(TSLA)
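As a quick check of the scaling step, MinMaxScaler maps the smallest value to 0 and the largest to 1 (toy values, not stock prices):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy check: each value is transformed as (x - min) / (max - min)
values = np.array([[10.0], [20.0], [30.0]])
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(values)

print(scaled.ravel())  # [0.  0.5 1. ]
```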
Once the data has been scaled, it is split into training and testing sets.
The training dataset has two parts, xtrain and ytrain. The training set is the subset of the data used to train the model:
xtrain is the training data.
ytrain is the set of labels for the data in xtrain.
It can also be stated that xtrain is the independent training dataset whereas ytrain is the dependent training dataset.
The test dataset likewise has two parts, xtest and ytest, and is the subset of the data on which the model is checked: xtest is the test data, and ytest is the set of labels for the data in xtest. This split is performed later.
# Splitting the data into training; creating x train and y train datasets
# Timesteps are 30 as we are looking at the past 30 days' prices
train = scaled_TSLA[0:training_length, : ]
xtrain=[]
ytrain = []
for i in range(30, len(train)):
    xtrain.append(train[i-30:i, 0])
    ytrain.append(train[i, 0])
Now, the independent training set xtrain and the dependent training set ytrain are converted to numpy arrays before applying the LSTM model. Then, only the xtrain dataset is reshaped into a 3-dimensional array.
# Converting the xtrain and ytrain datasets into an array format
xtrain, ytrain = np.array(xtrain), np.array(ytrain)
# Reshaping only the xtrain dataset into a 3-dimensional format
xtrain = np.reshape(xtrain, (xtrain.shape[0], xtrain.shape[1], 1))
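The sliding-window split and reshape above can be illustrated on a toy series, using a window of 3 instead of 30 (made-up values): each x sample is 3 consecutive values and its label y is the value that follows.

```python
import numpy as np

# Toy illustration of the sliding-window split with a window of 3
series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x, y = [], []
for i in range(3, len(series)):
    x.append(series[i-3:i])  # 3 consecutive values
    y.append(series[i])      # the next value is the label

# Reshape to the LSTM's expected (samples, timesteps, features) format
x = np.array(x).reshape(len(x), 3, 1)
y = np.array(y)

print(x.shape, y.tolist())  # (3, 3, 1) [4.0, 5.0, 6.0]
```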
Then, the model is built as follows -
First, the network is initialized with the Sequential class.
Then, an LSTM layer comprised of memory units is added. Its input shape specifies the number of timesteps, 30, and the number of features, 1 (the adjusted closing price). Setting return_sequences=True passes the output of the entire sequence on to the next LSTM layer; stacking a second LSTM layer in this way deepens the network.
Finally, a Dense layer is added, which outputs the prediction.
# Initializing and layering the network
# Units refers to the output dimensionality; any number of units can be assigned to the LSTM layers
# Just 1 unit is assigned to the Dense layer as a single value is predicted
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(xtrain.shape[1], 1)))
model.add(LSTM(units=50))
model.add(Dense(units=1))
Next, the model is compiled using an optimizer, with mean squared error as the loss metric, and trained on the xtrain and ytrain datasets.
Batch size refers to the number of training examples in a single batch. An epoch is one complete pass of the data set forward and backward through the neural network.
# The network is compiled using the adam optimizer which is an algorithm used in deep neural network models
# The mean squared error is the metric to determine the error rate
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(xtrain, ytrain, batch_size=1, epochs=1)
Then, the test dataset is created, consisting of xtest and ytest. xtest is converted to a numpy array and reshaped into a 3-dimensional format.
Finally, the model is checked on the test dataset and the predictions are stored.
The root mean squared error (RMSE) measures the loss. The lower the RMSE, the more accurate the model; a value of 0 would mean the model's predictions match the actual values in the test set perfectly.
# Creating testing data
# Dividing the testing data into xtest and ytest
test = scaled_TSLA[training_length - 30: , : ]
xtest = []
ytest = TSLA[training_length: , : ]
for i in range(30, len(test)):
    xtest.append(test[i-30:i, 0])
# Converting the xtest data into an array format
xtest = np.array(xtest)
# Converting the xtest dataset into a 3 dimensional format
xtest = np.reshape(xtest, (xtest.shape[0], xtest.shape[1], 1))
# Calculating predictions
predictions = model.predict(xtest)
predictions = scaler.inverse_transform(predictions)
# Calculating RMSE
rmse = np.sqrt(np.mean((predictions - ytest) ** 2))
rmse
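As a quick check of the RMSE formula itself (with made-up numbers, not the model's output): two predictions that each miss by 2 give an RMSE of exactly 2.

```python
import numpy as np

# Toy check of RMSE: sqrt of the mean of squared errors
pred = np.array([3.0, 5.0])
actual = np.array([1.0, 3.0])
toy_rmse = np.sqrt(np.mean((pred - actual) ** 2))

print(toy_rmse)  # 2.0
```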
Now, we visualize the actual and predicted values via a graph.
# Plotting the graph
trained = data[:training_length]
actual = data[training_length:].copy()  # copy to avoid a SettingWithCopyWarning
actual['Predictions'] = predictions
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Adjusted Closing Price', fontsize=18)
plt.plot(trained['Adj Close'])
plt.plot(actual[['Adj Close', 'Predictions']])
plt.legend(['Trained', 'Actual', 'Predictions'], loc='upper left')
plt.show()
# Applying the model to predict a price into the future
# Retrieving the data via Yahoo Finance and filtering to include just the Adjusted Closing Price
TSLA_price_1 = pdr.get_data_yahoo('TSLA', start='2012-01-01', end='2020-12-08')
# TSLA_price_1 = pd.read_csv("TSLA_price_1.csv").set_index("Date")
# TSLA_price_1.head()
TSLA_price_1 = TSLA_price_1.filter(['Adj Close'])
# Storing the last 30 days prices
last30 = TSLA_price_1[-30:].values
# Scaling the last 30 days prices
last30_scaled = scaler.transform(last30)
# Storing the last 30 days prices into Xtest
Xtest = []
Xtest.append(last30_scaled)
Xtest = np.array(Xtest)
Xtest = np.reshape(Xtest, (Xtest.shape[0], Xtest.shape[1], 1))
# Applying the model to predict the price of the 31st day and printing the same
predictedprice = model.predict(Xtest)
predictedprice = scaler.inverse_transform(predictedprice)
print(predictedprice)
# Checking whether the predicted price is close to the actual price
TSLA_price_2 = pdr.get_data_yahoo('TSLA', start='2020-12-09', end='2020-12-09')
# TSLA_price_2 = pd.read_csv("TSLA_price_2.csv").set_index("Date")
# TSLA_price_2.head()
print(TSLA_price_2['Adj Close'])
To conclude, Tesla is currently a high-risk, high-reward stock and has often been labeled one of the most dangerous stocks. Despite this, the company attracts a large number of investors. Tesla's innovative cars, disruptive technologies and passion for sustainability have found a huge fan base among millennial investors. To many millennials, Tesla's appeal lies in its ability to improve human life through self-driving cars, space exploration, and other developments. As often stated in the media, Tesla is not just a car brand but a lifestyle.
The learning process has been both challenging and interesting. Through this project, I got the opportunity to visualize time series data more effectively and to dive deeper into data mining by applying machine learning models such as neural networks.